Stencil-Aware GPU Optimization of Iterative Solvers
Authors
Abstract
Numerical solutions of nonlinear partial differential equations frequently rely on iterative Newton-Krylov methods, which linearize a finite-difference, stencil-based discretization of the problem and thereby produce a sparse matrix with regular structure. Knowledge of this structure can be used to exploit parallelism and locality of reference on modern cache-based multi- and manycore architectures, achieving high performance for the computations underlying commonly used iterative linear solvers. In this paper we describe our approach to sparse matrix data structure design and our implementation of the kernels underlying iterative linear solvers in PETSc. We also describe autotuning of CUDA implementations based on high-level descriptions of the stencil-based matrix and vector operations.
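To make the idea of a stencil-aware matrix format concrete, here is a minimal Python sketch. It assumes a 2-D grid, a 5-point stencil, and zero Dirichlet boundaries; the `StencilMatrix` class, its one-coefficient-array-per-offset layout, and `laplacian_5pt` are hypothetical illustrations, not PETSc's data structure or API. Because the stencil offsets are fixed, column indices are never stored: each neighbor index is computed from grid coordinates, which is the kind of regular structure the abstract refers to.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StencilMatrix:
    """Hypothetical stencil-based sparse matrix on an nx-by-ny grid.

    coeffs[k][p] is the coefficient coupling grid point p to its neighbor
    at offset offsets[k]; no column-index array is stored.
    """
    nx: int
    ny: int
    offsets: List[Tuple[int, int]]  # (dx, dy) for each stencil point
    coeffs: List[List[float]] = field(default_factory=list)

    def __post_init__(self):
        # Allocate one zero-filled coefficient array per stencil offset.
        if not self.coeffs:
            n = self.nx * self.ny
            self.coeffs = [[0.0] * n for _ in self.offsets]

    def index(self, i: int, j: int) -> int:
        # Natural row-major ordering: the x index varies fastest.
        return j * self.nx + i

    def matvec(self, x: List[float]) -> List[float]:
        y = [0.0] * (self.nx * self.ny)
        # Offset-major loop: each pass streams one contiguous coefficient
        # array, and neighbor indices are computed rather than loaded.
        for k, (dx, dy) in enumerate(self.offsets):
            c = self.coeffs[k]
            for j in range(self.ny):
                jj = j + dy
                if not 0 <= jj < self.ny:
                    continue  # neighbor row lies outside the grid
                for i in range(self.nx):
                    ii = i + dx
                    if 0 <= ii < self.nx:
                        p = self.index(i, j)
                        y[p] += c[p] * x[self.index(ii, jj)]
        return y


def laplacian_5pt(nx: int, ny: int, h: float = 1.0) -> StencilMatrix:
    """Assemble the 5-point discretization of -Laplace(u), zero Dirichlet BCs."""
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    A = StencilMatrix(nx, ny, offsets)
    inv_h2 = 1.0 / (h * h)
    for p in range(nx * ny):
        A.coeffs[0][p] = 4.0 * inv_h2  # center
        for k in range(1, 5):
            A.coeffs[k][p] = -inv_h2   # west, east, south, north
    return A
```

For example, `laplacian_5pt(4, 4).matvec([1.0] * 16)` gives 0.0 at the four interior points, 1.0 at non-corner edge points, and 2.0 at the corners, because neighbors outside the grid are skipped. Coefficients that point outside the grid are still stored, much like the padding in the DIA (diagonal) storage format; that trade of a little memory for index-free, regular access is one plausible reason a stencil-aware layout can suit GPUs.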
Similar resources
GPU Implementation of Iterative Solvers in Numerical Weather Predicting Models
Numerical weather prediction models often require solving a 3-D Helmholtz problem, derived from the governing equations of the dynamical core in the Met Office Unified Model, with preconditioned iterative solvers. This dissertation focuses on a GPU implementation of the preconditioned conjugate gradient (CG) iterative method. A given serial code has been ported to the GPU. According to the portin...
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil
Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits their performance and power efficiency. In this paper, we accelerate the forward-modeling technique on the latest multi-co...
GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging
We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, except that elements with duplicate column indices are aggregated on the fly. The main kernel merges small numbers of sparse rows at once using sub-warps of threads to realize an early compression effect...
Optimizing Stencil Computations: Multicore-optimized Wavefront Diamond Blocking on Shared and Distributed Memory Systems
Iterative Stencil Computations (ISC) appear in a wide variety of scientific applications, with partial differential equation (PDE) solvers being the most important. In iterative stencil computations, each point in a multi-dimensional spatial grid is updated using weighted contributions from its neighbor points, defined by the stencil operator. The stencil operator specifies the relative coordinate...
GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs
Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...
Journal: SIAM J. Scientific Computing
Volume 35
Published: 2013